Bootstrapping Distributional Feature Vector Quality
Authors
Abstract
This article presents a novel bootstrapping approach for improving the quality of feature vector weighting in distributional word similarity. The method was motivated by attempts to utilize distributional similarity for identifying the concrete semantic relationship of lexical entailment. Our analysis revealed that a major reason for the rather loose semantic similarity obtained by distributional similarity methods is insufficient quality of the word feature vectors, caused by deficient feature weighting. This observation led to the definition of a bootstrapping scheme which yields improved feature weights, and hence higher quality feature vectors. The underlying idea of our approach is that features which are common to similar words are also most characteristic for their meanings, and thus should be promoted. This idea is realized via a bootstrapping step applied to an initial standard approximation of the similarity space. The superior performance of the bootstrapping method was assessed in two different experiments, one based on direct human gold-standard annotation and the other based on an automatically created disambiguation dataset. These results are further supported by applying a novel quantitative measurement of the quality of feature weighting functions. Improved feature weighting also allows massive feature reduction, which indicates that the most characteristic features for a word are indeed concentrated at the top ranks of its vector. Finally, experiments with three prominent similarity measures and two feature weighting functions showed that the bootstrapping scheme is robust and is independent of the original functions over which it is applied.
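The core idea, promoting features that recur among a word's distributionally nearest neighbours and then pruning the rest, can be sketched roughly as follows. This is a minimal illustration assuming sparse dict-based vectors, cosine similarity, and a similarity-weighted vote for feature promotion; the function names (bootstrap_reweight, nearest_neighbours) and the specific promotion formula are illustrative assumptions, not the paper's exact weighting scheme.

```python
# Hedged sketch of the bootstrapping step described in the abstract:
# features shared with a word's distributionally similar neighbours are
# promoted, on the assumption that they are its most characteristic ones.
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts: feature -> weight)."""
    shared = set(u) & set(v)
    dot = sum(u[f] * v[f] for f in shared)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def nearest_neighbours(vectors, word, k):
    """Top-k most similar words to `word` under the current feature weighting."""
    sims = [(other, cosine(vectors[word], vec))
            for other, vec in vectors.items() if other != word]
    sims.sort(key=lambda x: x[1], reverse=True)
    return sims[:k]


def bootstrap_reweight(vectors, k=10, keep=100):
    """One bootstrapping pass over an initial similarity space: promote features
    that recur among a word's nearest neighbours, then keep only the `keep`
    top-ranked features (feature reduction)."""
    new_vectors = {}
    for word, vec in vectors.items():
        neighbours = nearest_neighbours(vectors, word, k)
        # For every feature of `word`, accumulate similarity-weighted support
        # from neighbours that share it (the weighting choice is an assumption).
        support = defaultdict(float)
        for other, sim in neighbours:
            for f in vectors[other]:
                if f in vec:
                    support[f] += sim
        # Promote shared features; features with no neighbour support keep
        # their original weight unchanged.
        reweighted = {f: w * (1.0 + support[f]) for f, w in vec.items()}
        # Massive feature reduction: retain only the highest-weighted features.
        top = sorted(reweighted.items(), key=lambda x: x[1], reverse=True)[:keep]
        new_vectors[word] = dict(top)
    return new_vectors
```

The sketch performs a single pass over an initial approximation of the similarity space, in line with the abstract's description of one bootstrapping step; the initial weights themselves would come from a standard weighting function such as PMI, which is outside the scope of this illustration.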
Similar resources
Feature Vector Quality and Distributional Similarity
We suggest a new goal and evaluation criterion for word similarity measures. The new criterion, meaning-entailing substitutability, fits the needs of semantic-oriented NLP applications and can be evaluated directly (independently of an application) at a good level of human agreement. Motivated by this semantic criterion, we analyze the empirical quality of distributional word feature vectors and its...
Trained Named Entity Recognition using Distributional Clusters
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recognition. The default feature set of BWI is augmented with features based on distributional term clusters induced from a large unlabeled text corpus. Using no traditional linguistic resources, such as syntactic tags or speci...
Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests
This work develops formal statistical inference procedures for predictions generated by supervised learning ensembles. Ensemble methods based on bootstrapping, such as bagging and random forests, have improved the predictive accuracy of individual trees, but fail to provide a framework in which distributional results can be easily determined. Instead of aggregating full bootstrap samples, we co...
On the effect of synthetic morphological feature vectors on hyperspectral image classification performance
This paper studies the effect of synthetic feature vectors on the classification performance of hyperspectral remote sensing images. Morphological attribute profiles, which have proven themselves in this field, are employed as feature vectors. At this early stage of our work, the relatively simple Bootstrapping algorithm has been used for synthetic feature vector generation. Base...
Publication date: 2009